Arabic text categorization: a comparative study of different representation modes

Authors

  • Zakaria Elberrichi
  • Karima Abidi

Abstract

The quantity of information accessible on the Internet is phenomenal, and its categorization remains one of the most important problems. Much current work focuses on English, rightly so, since it is the dominant language of the Web. However, the Web becomes more multilingual every day, so there is also a need to handle other languages, and this need is especially pressing for Arabic. Our research concerns the categorization of Arabic texts. Its originality lies in the use of a conceptual representation of the text, for which we use Arabic WordNet (AWN) as a lexical and semantic resource. To understand the effect of this representation, we include it in a comparative study with the other usual representation modes (bag of words and N-grams), using different similarity measures. The results show the advantages of this representation over the more conventional methods and demonstrate that adding the semantic dimension is one of the most promising approaches for the automatic categorization of Arabic texts.
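To make the compared representation modes concrete, here is a minimal Python sketch on toy data. It is not the authors' implementation. The whitespace tokenizer (no stemming or stop-word removal), the trigram size, the choice of cosine and Jaccard as the similarity measures, the 1-nearest-neighbour classifier, and the `TOY_CONCEPTS` table (a hypothetical stand-in for Arabic WordNet synset lookups) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Bag of words: term frequencies over whitespace tokens (no stemming)."""
    return Counter(text.split())

def char_ngrams(text, n=3):
    """Character n-grams extracted inside each space-padded word."""
    grams = Counter()
    for word in text.split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams

# HYPOTHETICAL toy table mapping words to concept IDs; it only stands in for
# an Arabic WordNet synset lookup and is not part of AWN itself.
TOY_CONCEPTS = {"سيارة": "CONCEPT_CAR", "عربة": "CONCEPT_CAR"}

def concepts(text, table=TOY_CONCEPTS):
    """Conceptual representation: replace each known word by its concept ID,
    keeping unknown words unchanged."""
    return Counter(table.get(word, word) for word in text.split())

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(u, v):
    """Jaccard similarity between the sets of features present in u and v."""
    union = set(u) | set(v)
    return len(set(u) & set(v)) / len(union) if union else 0.0

def classify(text, training, represent, similarity):
    """1-nearest neighbour: return the label of the most similar training text."""
    query = represent(text)
    scored = [(similarity(query, represent(doc)), label) for doc, label in training]
    return max(scored, key=lambda pair: pair[0])[1]

if __name__ == "__main__":
    a, b = "سيارة سريعة", "عربة سريعة"  # "fast car", written with two synonyms for "car"
    for name, rep in [("bag of words", bag_of_words),
                      ("char 3-grams", char_ngrams),
                      ("concepts", concepts)]:
        print(f"{name}: cosine={cosine(rep(a), rep(b)):.3f} "
              f"jaccard={jaccard(rep(a), rep(b)):.3f}")
```

On the two toy documents, which differ only in the synonym used for "car", the bag-of-words vectors score about 0.50 (cosine) and 0.33 (Jaccard), the character trigrams about 0.53 and 0.36, and the concept vectors 1.00 on both measures, because both synonyms collapse onto the same concept ID. This mirrors, on a trivially small scale, the kind of effect the abstract attributes to the semantic dimension; it says nothing about the magnitudes reported in the study.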

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

A Comparative Study with Different Feature Selection For Arabic Text Categorization

Feature selection benefits a learner by eliminating non-informative or noisy features and by reducing the overall feature space to a manageable size. The term feature selection is used in machine learning for the process of selecting a subset of the features used to represent the text. In this paper, we propose a new approach for text representation based on incorporating background knowledge Arabi...

A comparative study of the text inside the Mihrabi rug by Zareh Penyamin and Topkapi Palace Museum according to the existing discourse in the 16th and 19th

In Turkey, in the city of Hereke, at the end of the 19th century, rugs known as Mihrabi became popular; they were inspired by Safavid-era rugs kept in the Topkapi Palace Museum. In these rugs, which were reproduced on a large scale in royal workshops, some changes were made to the verbal text and to the incorporated visual elements. Among the rugs that seem to have ...

A Comparative Study in Relation to the Translation of the Linguistic Humor

Mark Twain used repetition and parallelism as two comedic literary devices to create a comic effect for readers. These linguistic humor devices seem to cause many difficulties in the translation of literary texts. The present study applied Delabastita's strategies for translating wordplay, such as repetition and parallelism, in the translation of humorous texts...

New stemming for Arabic text classification using feature selection and decision trees

In this paper we conduct a comparative study of two stemming algorithms, the Khoja stemmer and our new stemmer, for Arabic text classification (categorization), using chi-square statistics for feature selection and focusing on a decision tree classifier. The evaluation used a corpus of 5070 documents independently classified into six categories: sport, entertainment, business, middle east...

Journal title:
  • Int. Arab J. Inf. Technol.

Volume 9

Publication date: 2012